Downsample waterlevel trends to daily min depth-to-water by jirhiker · Pull Request #87 · DataIntegrationGroup/DataIntegrationEngine

jirhiker · 2026-06-26T21:27:49Z

Problem

The nm_waterlevel_trends combine step crashed its child process at the native level: a ChildProcessCrashException with no Python traceback (an OOM kill). Root cause: pymannkendall.original_test is O(n²), so high-frequency wells (continuous loggers with tens of thousands of readings) exhausted memory and CPU when the per-well test ran.

Fix

Before the trend test, downsample each well's observations to one point per calendar day, keeping the daily minimum depth-to-water (the shallowest reading). The new helper _daily_min_series does this. It bounds the Mann-Kendall cost and removes within-day sampling noise. A well measured continuously for years collapses from ~10⁴–10⁵ points to ~10³ daily points.
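To make the size of the reduction concrete, here is a small arithmetic sketch. Mann-Kendall and Theil-Sen both work over every pair of points, and there are n(n-1)/2 such pairs. The memory figure assumes an implementation that stores each pairwise slope as an 8-byte float; that is an illustrative assumption, not something this PR states about pymannkendall's internals.

```python
def pair_count(n: int) -> int:
    """Number of distinct (i, j) pairs with i < j among n points."""
    return n * (n - 1) // 2


def pairwise_slope_bytes(n: int, bytes_per_value: int = 8) -> int:
    """Bytes needed if every pairwise slope were held in memory at once.

    bytes_per_value=8 assumes float64 storage (an assumption for illustration).
    """
    return pair_count(n) * bytes_per_value


for n in (1_000, 100_000):
    print(f"n={n:>7,}  pairs={pair_count(n):>13,}  bytes={pairwise_slope_bytes(n):>14,}")
```

Under that assumption, 10⁵ raw readings imply roughly 5 billion pairs (about 40 GB of slopes), while 10³ daily points imply about 500 thousand pairs (about 4 MB).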

Changes

_daily_min_series(obs_list): groups observations by UTC calendar day and keeps the minimum DTW per day, timestamped at that day's midnight epoch; returns (raw_count, sorted_daily_pairs).
The trend dumper uses it: record_count is now the daily point count used for the trend, and a new observation_count field carries the raw reading count.
Qualification gate + Mann-Kendall/Theil-Sen classification unchanged (now applied to the daily series).
Updated TREND_METHOD_DESCRIPTION to document the daily-min step.
New test: same-day readings collapse to the min; observation_count vs record_count.
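The downsampling step described in the list above can be sketched as follows. This is a minimal illustration, not the code from ogc_features.py: the function name daily_min_series_sketch and the input shape (an iterable of (epoch_seconds, depth_to_water) pairs) are assumptions made for the example.

```python
SECONDS_PER_DAY = 86_400


def daily_min_series_sketch(readings):
    """Collapse readings to one (midnight_epoch, min_dtw) pair per UTC day.

    `readings` is assumed to be an iterable of (epoch_seconds, depth_to_water)
    pairs; the real helper's input shape may differ.
    Returns (raw_count, sorted_daily_pairs).
    """
    raw_count = 0
    daily_min = {}
    for epoch_seconds, dtw in readings:
        raw_count += 1
        # Floor the Unix timestamp to 00:00:00 UTC of the same calendar day.
        day_start = int(epoch_seconds) - int(epoch_seconds) % SECONDS_PER_DAY
        current = daily_min.get(day_start)
        if current is None or dtw < current:
            daily_min[day_start] = dtw
    return raw_count, sorted(daily_min.items())


# Two readings on day 0 and one on day 1.
raw, daily = daily_min_series_sketch([(3_600, 12.5), (7_200, 11.0), (86_410, 13.0)])
print(raw, daily)  # 3 [(0, 11.0), (86400, 13.0)]
```

Here raw corresponds to what the PR calls observation_count, and len(daily) to the new meaning of record_count.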

Verification

14 persister tests pass. Independent of #86 (touches only ogc_features.py + its test).

🤖 Generated with Claude Code

High-frequency wells (e.g. continuous loggers, tens of thousands of readings) made the per-well Mann-Kendall test (O(n^2)) blow up memory and get the combine child process OOM-killed (ChildProcessCrashException, no Python traceback).

Before the trend test, reduce each well's observations to one point per calendar day keeping the daily minimum depth-to-water (the shallowest reading), via _daily_min_series. This bounds the MK cost and removes within-day sampling noise.

- record_count is now the daily-point count used for the trend; new observation_count carries the raw reading count.
- qualification gate and classification unchanged (applied to daily points); updated TREND_METHOD_DESCRIPTION.
- test: daily-min downsampling (same-day readings collapse to the min).

14 persister tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions · 2026-06-26T21:28:19Z

Your pull request is automatically being deployed to Dagster Cloud.

Location	Status	Link	Updated
`die-orchestration`		View in Cloud	Jun 26, 2026 at 09:32 PM (UTC)

The trend combine rebuilt every observation into a ParameterRecord (and each site into a SiteRecord) before computing: millions of objects for statewide water-level data, on top of the already-large pickled inputs.

Consume the payload dicts directly: dump_waterlevel_trend_collection and _daily_min_series now read dicts (obs.get / site.get) instead of getattr on record objects, and the combine asset passes all_sites/all_timeseries straight through. Cuts peak memory in the step that was OOM-crashing.

Other dumpers (summary, major-chemistry, timeseries) still use record objects. Trend test helpers now build dicts.

14 persister tests pass.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
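As a rough illustration of reading payload dicts directly, the sketch below streams (time, depth) pairs out of plain dicts with obs.get, without building an intermediate record object per observation. The function name iter_depth_pairs and the keys "epoch_seconds" and "depth_to_water" are hypothetical; the actual payload keys are not shown in this PR.

```python
def iter_depth_pairs(observations):
    """Yield (epoch_seconds, depth_to_water) from payload dicts.

    Key names are hypothetical placeholders. Rows missing either field
    are skipped rather than raising.
    """
    for obs in observations:
        t = obs.get("epoch_seconds")
        dtw = obs.get("depth_to_water")
        if t is None or dtw is None:
            continue
        yield t, float(dtw)


payload = [
    {"epoch_seconds": 0, "depth_to_water": "10.5"},
    {"epoch_seconds": 5},  # incomplete row, skipped
    {"epoch_seconds": 60, "depth_to_water": 9.0},
]
print(list(iter_depth_pairs(payload)))  # [(0, 10.5), (60, 9.0)]
```

Because this is a generator, only one observation dict is touched at a time, which fits the commit's goal of lowering peak memory in the combine step.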

jirhiker merged commit b9ea78c into main Jun 26, 2026
3 checks passed

jirhiker merged 2 commits into main from fix/waterlevel-trend-daily-downsample